Identifying Interesting Instances for Probabilistic Skylines

Authors

  • Yinian Qi
  • Mikhail J. Atallah
Abstract

Uncertain data arises in applications such as sensor networks, scientific data management, data integration, and location-based services. While significant research effort has been dedicated to modeling, managing, and querying uncertain data, advanced analysis of uncertain data is still in its early stages. In this paper, we focus on skyline analysis of uncertain data, modeled as uncertain objects with probability distributions over a set of possible values called instances. Computing the exact skyline probabilities of instances is expensive, and it is unnecessary when the user is only interested in instances whose skyline probabilities exceed a certain threshold. We propose two filtering schemes for this case: a preliminary scheme that bounds an instance’s skyline probability for filtering, and an elaborate scheme that uses an instance’s bounds to filter other instances based on the dominance relationship. We identify applications where instance-level filtering is useful and desirable. Our algorithms can be easily adapted to filter at the object level if the application domain requires it. Moreover, the uncertain model we adopt in this paper allows missing probabilities of uncertain objects as well as arbitrary probability distributions over instances. We experimentally demonstrate the effectiveness of our filtering schemes on both a real NBA data set and a synthetic data set.
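To make the threshold idea concrete, the following is a minimal Python sketch, not the authors' filtering schemes. It assumes the commonly used formulation in which an instance's skyline probability is its own probability multiplied, over every other object, by one minus the total probability of that object's instances that dominate it. It also assumes that smaller values are better in every dimension and that an instance qualifies when its probability is at least the threshold. The names `dominates`, `skyline_probability`, `passes_threshold`, and the sample data are all hypothetical. The filter relies only on the fact that each factor is at most 1, so any partial product is an upper bound on the final value.

```python
def dominates(a, b):
    """True if point a is no worse than b everywhere and strictly better
    in at least one dimension (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(objects, obj_id, inst_idx):
    """Exact skyline probability of one instance under the assumed model."""
    point, prob = objects[obj_id][inst_idx]
    result = prob
    for other_id, instances in objects.items():
        if other_id == obj_id:
            continue  # instances of the same object are mutually exclusive
        dominating_mass = sum(q for v, q in instances if dominates(v, point))
        result *= 1.0 - dominating_mass
    return result


def passes_threshold(objects, obj_id, inst_idx, threshold):
    """Threshold test with early exit: the running product is an upper
    bound, so the instance is discarded as soon as it drops below threshold."""
    point, prob = objects[obj_id][inst_idx]
    upper = prob
    if upper < threshold:
        return False
    for other_id, instances in objects.items():
        if other_id == obj_id:
            continue
        upper *= 1.0 - sum(q for v, q in instances if dominates(v, point))
        if upper < threshold:
            return False  # remaining objects can only lower the bound
    return True


# Hypothetical data: object id -> list of (point, probability).
# Object "B" sums to 0.9, illustrating a missing probability of 0.1.
objects = {
    "A": [((1, 4), 0.5), ((3, 3), 0.5)],
    "B": [((2, 2), 0.6), ((5, 1), 0.3)],
    "C": [((4, 5), 1.0)],
}

survivors = [
    (oid, i)
    for oid, insts in objects.items()
    for i in range(len(insts))
    if passes_threshold(objects, oid, i, 0.3)
]
```

Here `survivors` collects the instances that meet a threshold of 0.3 without necessarily finishing the full product for the ones that fail. The paper's elaborate scheme goes further by using one instance's bounds to filter other instances through dominance, which this sketch does not attempt.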


Similar resources

Probabilistic Skylines on Uncertain Data

Uncertain data are inherent in some important applications. Although a considerable amount of research has been dedicated to modeling uncertain data and answering some types of queries on uncertain data, how to conduct advanced analysis on uncertain data remains an open problem at large. In this paper, we tackle the problem of skyline analysis on uncertain data. We propose a novel probabilistic...


Continuous Probabilistic Skyline Queries over Uncertain Data Streams

Recently, some approaches of finding probabilistic skylines on uncertain data have been proposed. In these approaches, a data object is composed of instances, each associated with a probability. The probabilistic skyline is then defined as a set of non-dominated objects with probabilities exceeding or equaling a given threshold. In many applications, data are generated as a form of continuous d...


Semi-Skylines and Skyline-Snippets

Skyline evaluation techniques (also known as Pareto preference queries) follow a common paradigm that eliminates data elements by finding other elements in the data set that dominate them. To date already a variety of sophisticated skyline evaluation techniques are known, hence skylines are considered a well researched area. Nevertheless, in this paper we come up with interesting new aspects. O...


UNIVERSITÄT AUGSBURG Semi-Skylines and Skyline Snippets

Skyline evaluation techniques (also known as Pareto preference queries) follow a common paradigm that eliminates data elements by finding other elements in the data set that dominate them. To date already a variety of sophisticated skyline evaluation techniques are known, hence skylines are considered a well researched area. Nevertheless, in this paper we come up with interesting new aspects. O...


Efficient Skyline Computation in MapReduce

Skyline queries are useful for finding interesting tuples from a large data set according to multiple criteria. The sizes of data sets are constantly increasing and the architectures of back-ends are switching from single-node environments to non-conventional paradigms like MapReduce. Despite the usefulness of skyline queries, existing works on skyline computation in MapReduce do not take full a...



Publication date: 2010